Abstract

Information flow, or information transfer, is an important concept in dynamical systems with
applications in a wide variety of scientific disciplines. In this study, we show that a rigorous
formalism can be established in the context of a generic stochastic dynamical system. The resulting
measure of information transfer possesses a property of transfer asymmetry and, when the
stochastic perturbation to the receiving component does not rely on the giving component, has
the same form as that for the corresponding deterministic system. An application with a
two-dimensional system is presented, and the resulting transfers are just as expected. A remarkable
observation is that, for two highly correlated time series, there could be no information transfer
from one series, say $x_2$, to the other ($x_1$). That is to say, the evolution of $x_1$ may have
nothing to do with $x_2$, even though $x_1$ and $x_2$ are highly correlated. Information transfer analysis
thus extends the traditional notion of correlation analysis by providing a quantitative measure of
causality between time series.
Information transfer, or information flow as it is sometimes called, is an important concept in
dynamical systems and general physics that has been of interest for decades [1-5].
Practical applications have been reported in fields like neuroscience [6] and atmosphere-ocean
science [3], and are envisioned in diverse disciplines such as turbulence research,
material science, and nanotechnology, to name a few, where ensemble forecasts are involved
and predictability becomes an issue. Recently, Liang and Kleeman [3] put this important
concept on a rigorous footing in the context of deterministic dynamical systems. In this
study, we will show that a rigorous formulation can also be obtained when the dynamical
system is stochastic. We consider only two-dimensional (2D) systems; systems of higher
dimensionality will be reported elsewhere.

We start with a brief review of the work in [3] to educe the strategy for building
our formalism for stochastic systems. Consider a 2D system

$$\frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}, t), \tag{1}$$

where $\mathbf{F} = (F_1, F_2)$, and the state variables $\mathbf{x} = (x_1, x_2) \in \mathbb{R}^2$. The randomness is limited
to the initial condition. For notational simplicity, we do not distinguish random variables
from deterministic variables; the distinction should be clear from the context. (In probability theory, they
are usually distinguished with lower and upper cases.) Let $\rho$ be the joint probability density
of $x_1$ and $x_2$, and suppose that it and its derivatives have compact support. Without loss of
generality, consider the information transfer from $x_2$ to $x_1$. We need the marginal density
of $x_1$, $\rho_1(t; x_1) = \int_{\mathbb{R}} \rho\, dx_2$, and the marginal (Shannon) entropy, $H_1 = -\int_{\mathbb{R}} \rho_1 \log\rho_1\, dx_1$.
$H_1$ varies as the system moves forward. Its variation is due to two different mechanisms:
one due to $x_1$ itself, written as $dH_1^*/dt$, another due to the transfer from $x_2$. The latter is the
very information transfer, which we will write as $T_{2\to1}$ hereafter. The rate of information
transfer from $x_2$ to $x_1$ is therefore the difference between $dH_1/dt$ and $dH_1^*/dt$: $T_{2\to1} = dH_1/dt - dH_1^*/dt$.
Among the terms on the right hand side, $dH_1/dt$ can be derived from the Liouville equation [9]
corresponding to (1); the key is the derivation of $dH_1^*/dt$, the entropy change as $x_1$ evolves on
its own. In [3], this is achieved with the aid of a theorem established therein: The joint
entropy of $(x_1, x_2)$, $H = -\iint_{\mathbb{R}^2} \rho\log\rho\, d\mathbf{x}$, evolves as

$$\frac{dH}{dt} = E(\nabla\cdot\mathbf{F}). \tag{2}$$

Here the operator $E$ is the mathematical expectation with respect to $\rho$. Liang and Kleeman
then intuitively argued that

$$\frac{dH_1^*}{dt} = E\left(\frac{\partial F_1}{\partial x_1}\right), \tag{3}$$

a result they later proved rigorously, and hence obtained the transfer $T_{2\to1}$.

The above formalism has been generalized to the information transfer within a deterministic
system of arbitrary dimensionality [3]; the key equation (3) has also been used to form
the transfer between two subspaces [?]. The generalization, however, encounters difficulty
when stochasticity is involved. Consider a system

$$d\mathbf{x} = \mathbf{F}(\mathbf{x}, t)\,dt + B(\mathbf{x}, t)\,d\mathbf{w}, \tag{4}$$

where $\mathbf{w} = (w_1, w_2)$ is a standard 2D Wiener process ("white noise"), and $B = (b_{ij})$ is the
perturbation amplitude. There is no such elegant form as (2) for the evolution of $H$. One
thus cannot obtain $dH_1^*/dt$ intuitively as (3) is obtained.

But on the other hand, $dH_1^*/dt$ may be equally understood as the rate of change of the
marginal entropy of $x_1$ with the effect from $x_2$ excluded. This alternative interpretation,
as we used in [3], sheds light on the above problem. To reflect this interpretation, we will
denote the term as $dH_{1\setminus2}/dt$ henceforth, the subscript $\setminus 2$ signifying "$x_2$ excluded". The rate of
information transfer from $x_2$ to $x_1$ is thence

$$T_{2\to1} = \frac{dH_1}{dt} - \frac{dH_{1\setminus2}}{dt}. \tag{5}$$

Here the key issue is how to find $dH_{1\setminus2}/dt$, which we will show shortly after the evaluation of $dH_1/dt$.

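Entropy evolution is related to density evolution. Corresponding to (4) there is a
Fokker-Planck equation [9]:

$$\frac{\partial\rho}{\partial t} + \frac{\partial (F_1\rho)}{\partial x_1} + \frac{\partial (F_2\rho)}{\partial x_2} = \frac{1}{2}\sum_{i,j=1}^{2} \frac{\partial^2 (g_{ij}\rho)}{\partial x_i\,\partial x_j}, \tag{6}$$

where $g_{ij} = g_{ji} = \sum_{k=1}^{2} b_{ik} b_{jk}$, $i, j = 1, 2$. This integrated over $\mathbb{R}$ with respect to $x_2$ gives
the evolution of $\rho_1$: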



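$$\frac{\partial\rho_1}{\partial t} + \int_{\mathbb{R}} \frac{\partial (F_1\rho)}{\partial x_1}\,dx_2 = \frac{1}{2}\int_{\mathbb{R}} \frac{\partial^2 (g_{11}\rho)}{\partial x_1^2}\,dx_2. \tag{7}$$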

Note that in the derivation we have used the fact that $\rho$ and its derivatives vanish at the
boundaries, as they are compactly supported. For notational succinctness, we will henceforth
suppress the integral domain $\mathbb{R}$, unless otherwise noted. Multiplying (7) by $-(1 + \log\rho_1)$,
followed by an integration with respect to $x_1$ over $\mathbb{R}$, one obtains

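$$\frac{dH_1}{dt} - \iint \log\rho_1\,\frac{\partial (F_1\rho)}{\partial x_1}\,dx_1 dx_2 = -\frac{1}{2}\iint \log\rho_1\,\frac{\partial^2 (g_{11}\rho)}{\partial x_1^2}\,dx_1 dx_2.$$

Integrating by parts, this is reduced to

$$\frac{dH_1}{dt} = -E\left(F_1\frac{\partial\log\rho_1}{\partial x_1}\right) - \frac{1}{2}E\left(g_{11}\frac{\partial^2\log\rho_1}{\partial x_1^2}\right), \tag{8}$$

where $E$ stands for expectation with respect to $\rho$.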

The key part of this study is the evaluation of $dH_{1\setminus2}/dt$. Examine a small time interval
$[t, t + \Delta t]$. $dH_{1\setminus2}/dt$ is the time rate of change of the marginal entropy of $x_1$ with $x_2$ frozen as a
parameter instantaneously at $t$. So one needs to consider a system on $[t, t + \Delta t]$ suddenly
modified at time $t$ from that prior to $t$. Clearly, $dH_{1\setminus2}/dt$ cannot be derived from the
Fokker-Planck equation (7), where the dynamics is consistent through time. One has to go back to
the definition of the derivative to achieve the goal. Let the marginal entropy evolved from $t$ to
$t + \Delta t$ with $x_2$ frozen at $t$ be $H_{1\setminus2}(t + \Delta t)$. We then have

$$\frac{dH_{1\setminus2}}{dt} = \lim_{\Delta t\to0} \frac{H_{1\setminus2}(t + \Delta t) - H_1(t)}{\Delta t},$$

and the whole problem now boils down to the derivation of $H_{1\setminus2}(t + \Delta t)$. In [3], we discretize
the deterministic equation (1) and evaluate the Frobenius-Perron operator for the discretized
system to compute the modified marginal entropy. For the stochastic system (4), however,
there is no such simple operator. We need a different approach to the problem.

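Denote by $x_{1\setminus2}$ the first component after $x_2$ is fixed as a parameter. The stochastic system
(4) is changed to

$$dx_{1\setminus2} = F_1(x_{1\setminus2}, x_2, t)\,dt + \sum_k b_{1k}\,dw_k, \quad \text{on } [t, t + \Delta t], \tag{9}$$

$$x_{1\setminus2} = x_1 \quad \text{at time } t. \tag{10}$$

Correspondingly, the density $\rho_{1\setminus2}$ evolves following the Fokker-Planck equation

$$\frac{\partial\rho_{1\setminus2}}{\partial t} + \frac{\partial (F_1\rho_{1\setminus2})}{\partial x_1} = \frac{1}{2}\frac{\partial^2 (g_{11}\rho_{1\setminus2})}{\partial x_1^2}, \quad \text{on } [t, t + \Delta t], \tag{11}$$

$$\rho_{1\setminus2} = \rho_1 \quad \text{at } t, \tag{12}$$

where $g_{11} = \sum_k b_{1k}^2$. Recall that, by definition, the Shannon entropy may be understood as the
expectation of a function of the state variable formed by composing minus the logarithm with
its density. This motivates one to introduce a function of $x_1$, $f_t(x_1) = \log\rho_{1\setminus2}(t, x_1)$, whose
evolution is obtained by dividing (11) by $\rho_{1\setminus2}$:

$$\frac{\partial f_t}{\partial t} = -\frac{1}{\rho_{1\setminus2}}\frac{\partial (F_1\rho_{1\setminus2})}{\partial x_1} + \frac{1}{2\rho_{1\setminus2}}\frac{\partial^2 (g_{11}\rho_{1\setminus2})}{\partial x_1^2}.$$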

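In a discretized version, this is

$$f_{t+\Delta t}(x_1) = f_t(x_1) - \frac{\Delta t}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1} + \frac{\Delta t}{2\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2} + O(\Delta t^2),$$

where the fact $\rho_{1\setminus2} = \rho_1$ at time $t$ has been used. (Functions without arguments explicitly
written out are supposed to be evaluated at $x_1(t)$.) So

$$f_{t+\Delta t}\big(x_{1\setminus2}(t + \Delta t)\big) = f_t\big(x_{1\setminus2}(t + \Delta t)\big) - \frac{\Delta t}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1} + \frac{\Delta t}{2\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2} + O(\Delta t^2).$$

The $x_{1\setminus2}(t + \Delta t)$ in the argument can be expanded by the Euler-Bernstein approximation
of (9):

$$x_{1\setminus2}(t + \Delta t) = x_1(t) + F_1\Delta t + \sum_k b_{1k}\Delta w_k + \text{h.o.t.}$$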

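Substituting back and performing Taylor series expansion, we get

$$\begin{aligned}
f_{t+\Delta t}\big(x_{1\setminus2}(t + \Delta t)\big) &= f_t\Big(x_1 + F_1\Delta t + \sum_k b_{1k}\Delta w_k\Big) - \frac{\Delta t}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1} + \frac{\Delta t}{2\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2} + O(\Delta t^2) \\
&= f_t(x_1) + \frac{\partial f_t}{\partial x_1}\Big(F_1\Delta t + \sum_k b_{1k}\Delta w_k\Big) + \frac{1}{2}\frac{\partial^2 f_t}{\partial x_1^2}\Big(F_1\Delta t + \sum_k b_{1k}\Delta w_k\Big)^2 \\
&\quad - \frac{\Delta t}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1} + \frac{\Delta t}{2\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2} + O(\Delta t^2).
\end{aligned} \tag{13}$$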

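Taking expectation on both sides, the left hand side is $-H_{1\setminus2}(t + \Delta t)$, and the first term on
the right hand side is $-H_1(t)$. Note that $\Delta w_k \sim N(0, \Delta t)$ for a Wiener process $w_k$. So

$$E\,\Delta w_k = 0, \qquad E(\Delta w_k)^2 = \Delta t.$$

The second term on the r.h.s. is

$$\Delta t\cdot E\left(F_1\frac{\partial f_t}{\partial x_1}\right) + E\left(\frac{\partial f_t}{\partial x_1}\sum_k b_{1k}\Delta w_k\right) = \Delta t\cdot E\left(F_1\frac{\partial f_t}{\partial x_1}\right),$$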

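where we have used the fact that $\Delta w_k$ is independent of $(x_1, x_2)$, and hence expectation
can be taken inside directly with $\Delta w_k$, which eliminates $E\left(\frac{\partial f_t}{\partial x_1}\sum_k b_{1k}\Delta w_k\right)$. For the same
reason, the third term after expansion leaves only one sub-term of order $\Delta t$, namely,

$$\frac{1}{2}E\left[\frac{\partial^2 f_t}{\partial x_1^2}\Big(\sum_k b_{1k}\Delta w_k\Big)\Big(\sum_j b_{1j}\Delta w_j\Big)\right] = \frac{1}{2}E\left[\frac{\partial^2 f_t}{\partial x_1^2}\Big(\sum_k b_{1k}^2(\Delta w_k)^2 + \sum_{k\neq j} b_{1k}b_{1j}\Delta w_k\Delta w_j\Big)\right].$$

Recall that the perturbations are independent. The summation over $k \neq j$ inside the
parentheses thus vanishes after expectation is performed. The first summation is equal to $g_{11}\Delta t$,
by the definition of $g_{ij}$ and the fact $E(\Delta w_k)^2 = \Delta t$. So the whole term is $\frac{\Delta t}{2}E\left(g_{11}\frac{\partial^2 f_t}{\partial x_1^2}\right)$.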
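The moment identities used in this step can be checked numerically. The following is a minimal sketch, assuming a hypothetical constant row $(b_{11}, b_{12}) = (1, 1)$, a step $\Delta t = 0.01$, and a sample size chosen only for illustration (none of these is prescribed by the derivation). It draws independent Wiener increments and compares the sample moments of $\sum_k b_{1k}\Delta w_k$ with $0$ and $g_{11}\Delta t$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical values, chosen only for this illustration.
b1 = np.array([1.0, 1.0])      # row (b_11, b_12) of the perturbation amplitude
dt = 0.01                      # time step Delta t
n_samples = 200_000

# Independent increments Delta w_k ~ N(0, dt) for k = 1, 2.
dw = rng.normal(0.0, np.sqrt(dt), size=(n_samples, 2))

s = dw @ b1                    # sum_k b_1k * Delta w_k for each sample
g11 = np.sum(b1 ** 2)          # g_11 = sum_k b_1k^2

mean_s = s.mean()                              # should be close to 0
second_moment = np.mean(s ** 2)                # should be close to g11 * dt
cross_moment = np.mean(dw[:, 0] * dw[:, 1])    # k != j term, close to 0

print(mean_s, second_moment, g11 * dt, cross_moment)
```

The cross moment is the sample counterpart of the $k \neq j$ summation that drops out after expectation.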

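With all these put together, the expectation of (13) gives (note $f_t = \log\rho_{1\setminus2}(t; x_1) = \log\rho_1$)

$$\begin{aligned}
H_{1\setminus2}(t + \Delta t) = H_1(t) &- \Delta t\cdot E\left(F_1\frac{\partial\log\rho_1}{\partial x_1}\right) - \frac{\Delta t}{2}E\left(g_{11}\frac{\partial^2\log\rho_1}{\partial x_1^2}\right) \\
&+ \Delta t\cdot E\left(\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1}\right) - \frac{\Delta t}{2}E\left(\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right) + O(\Delta t^2).
\end{aligned}$$
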

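The second and fourth terms on the right hand side can be combined to give $\Delta t\cdot E\left(\frac{\partial F_1}{\partial x_1}\right)$.
So

$$\frac{dH_{1\setminus2}}{dt} = \lim_{\Delta t\to0}\frac{H_{1\setminus2}(t + \Delta t) - H_1(t)}{\Delta t} = E\left(\frac{\partial F_1}{\partial x_1}\right) - \frac{1}{2}E\left(g_{11}\frac{\partial^2\log\rho_1}{\partial x_1^2}\right) - \frac{1}{2}E\left(\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right). \tag{14}$$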



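In the equation, the second and the third terms on the right hand side are from the stochastic
perturbation. The first term is precisely (3), the key result obtained in [3] through intuitive
argument based on the theorem (2). The above derivation supplies a proof of this argument.
The information transfer from $x_2$ to $x_1$ is obtained by subtracting (14) from (8):

$$T_{2\to1} = -E\left(\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1}\right) + \frac{1}{2}E\left(\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right), \tag{15}$$
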
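where $E$ is the expectation with respect to $\rho(x_1, x_2)$. Notice that the conditional density of
$x_2$ on $x_1$, $\rho_{2|1}$, is $\rho/\rho_1$. If we write the expectation with respect to $\rho_{2|1}$ as $E_{2|1}$, that is,
$E_{2|1}(\cdot) = \iint \rho_{2|1}\,(\cdot)\,dx_1 dx_2$, the above formula may be further simplified:

$$T_{2\to1} = -E_{2|1}\left(\frac{\partial (F_1\rho_1)}{\partial x_1}\right) + \frac{1}{2}E_{2|1}\left(\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right). \tag{16}$$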

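This is the transfer from $x_2$ to $x_1$. Likewise, the transfer from $x_1$ to $x_2$ can be obtained:

$$T_{1\to2} = -E_{1|2}\left(\frac{\partial (F_2\rho_2)}{\partial x_2}\right) + \frac{1}{2}E_{1|2}\left(\frac{\partial^2 (g_{22}\rho_2)}{\partial x_2^2}\right), \tag{17}$$

where $\rho_2 = \int \rho\, dx_1$ is the marginal density of $x_2$.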

Among the two terms of (16), the first is the same in form as the information transfer
obtained in [3] for the corresponding deterministic system. The contribution from the
stochasticity that modifies the formula is in the second term. An interesting observation
is that, if $g_{11} = \sum_k b_{1k}^2$ is independent of $x_2$, this term vanishes. To see this, notice that
$\int \rho_{2|1}\,dx_2 = 1$, which results in

$$E_{2|1}\left(\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right) = \int \frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\,dx_1 = 0.$$

We thus have the following property:

Given a stochastic system component, if the stochastic perturbation is independent of 
another component, then the information transfer from the latter is the same in form 
as that for the corresponding deterministic system. 

This property is interesting since a large proportion of the noise appearing in real problems is
additive; that is to say, $b_{ij}$, and hence $g_{ij}$, are often constant. The property shows that, in
terms of information transfer, these stochastic systems function like deterministic ones. But, of
course, the similarity is just in form; they are different in value. The first part on the right
hand side of (16) actually has stochasticity embedded in the marginal density.

Another property is the concretization of the requirement of transfer asymmetry
emphasized by Schreiber:

If the evolution of $x_1$ is independent of $x_2$, then $T_{2\to1}$ is zero.

In fact, if neither $F_1$ nor $g_{11}$ depends on $x_2$, the integrals in (7) can be evaluated
and the whole equation becomes a Fokker-Planck equation for $\rho_1$. In this case, $x_1$ behaves
like an independent variable. So by intuition, there should be no information flowing from
$x_2$. This is indeed true by formula (16). If $F_1 = F_1(x_1)$, integration can be made for $\rho_{2|1}$
with respect to $x_2$ inside the double integral, giving a zero $T_{2\to1}$.
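The integration step invoked here can be written out explicitly. The following is a sketch, assuming that $F_1 = F_1(x_1)$, that $g_{11}$ does not depend on $x_2$, that $\rho_1$ is compactly supported, and that the transfer is written as a double integral against $\rho_{2|1} = \rho/\rho_1$:

```latex
\begin{aligned}
T_{2\to1}
&= -\iint \rho_{2|1}\,\frac{\partial (F_1\rho_1)}{\partial x_1}\,dx_1\,dx_2
   + \frac{1}{2}\iint \rho_{2|1}\,\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\,dx_1\,dx_2 \\
&= -\int \frac{\partial (F_1\rho_1)}{\partial x_1}\left(\int \rho_{2|1}\,dx_2\right)dx_1
   + \frac{1}{2}\int \frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\left(\int \rho_{2|1}\,dx_2\right)dx_1 \\
&= -\Big[F_1\rho_1\Big]_{-\infty}^{\infty}
   + \frac{1}{2}\left[\frac{\partial (g_{11}\rho_1)}{\partial x_1}\right]_{-\infty}^{\infty}
 = 0 .
\end{aligned}
```

The second line uses the fact that the $x_1$-derivatives no longer depend on $x_2$, the third uses $\int \rho_{2|1}\,dx_2 = 1$, and the boundary terms vanish by compact support.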

The formulas (16) and (17) are expected to be applicable in a wide variety of fields. To
demonstrate an application, consider a 2D linear system:

$$d\mathbf{x} = A\mathbf{x}\,dt + B\,d\mathbf{w}, \tag{18}$$
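where $A = (a_{ij})$ and $B = (b_{ij})$ are constant matrices. Further suppose that $\mathbf{x}$ has an initial
Gaussian distribution; it is then Gaussian all the time [10], with a mean $\boldsymbol{\mu} = (\mu_1, \mu_2)^T$ and a
covariance matrix $C = (c_{ij})$ evolving as

$$\frac{d\boldsymbol{\mu}}{dt} = A\boldsymbol{\mu}, \tag{19a}$$

$$\frac{dC}{dt} = AC + CA^T + BB^T. \tag{19b}$$

The solution of these equations determines the density

$$\rho(\mathbf{x}) = \frac{1}{2\pi(\det C)^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T C^{-1}(\mathbf{x} - \boldsymbol{\mu})\right],$$

which, after being substituted into (16) and (17), gives the transfers between $x_1$ and $x_2$.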
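As a worked evaluation of this substitution (not given in the text), one can start from $T_{2\to1} = -E\left[\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1}\right] + \frac{1}{2}E\left[\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right]$. The sketch below assumes that $B$ is constant, so that the second term vanishes, and that $\rho_1$ is Gaussian with mean $\mu_1$ and variance $c_{11}$. Under these assumptions the transfer appears to reduce to a simple closed form:

```latex
\begin{aligned}
F_1 &= a_{11}x_1 + a_{12}x_2, \qquad
\frac{\partial\log\rho_1}{\partial x_1} = -\frac{x_1 - \mu_1}{c_{11}}, \\
T_{2\to1}
&= -E\left(\frac{\partial F_1}{\partial x_1}\right)
   - E\left(F_1\,\frac{\partial\log\rho_1}{\partial x_1}\right)
 = -a_{11} + \frac{a_{11}c_{11} + a_{12}c_{12}}{c_{11}}
 = a_{12}\,\frac{c_{12}}{c_{11}}, \\
T_{1\to2} &= a_{21}\,\frac{c_{12}}{c_{22}} \quad \text{(by the same steps with the indices exchanged).}
\end{aligned}
```

Here $E[F_1 (x_1 - \mu_1)] = a_{11}c_{11} + a_{12}c_{12}$ follows from the definition of the covariance matrix.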

For an example, let all the entries of $B$ be 1, and $a_{11} = a_{22} = -0.5$, $a_{12} = 0.1$, leaving
$a_{21}$ open for experiment. First consider $a_{21} = 0$. It is easy to show that this system has an
equilibrium solution: $\boldsymbol{\mu} = (0, 0)$, $c_{11} = 2.44$, $c_{12} = c_{21} = 2.2$, $c_{22} = 2$, whatever the initial
conditions are. Fig. 1a shows the time evolutions of $\boldsymbol{\mu}$ and $C$ initialized with $\boldsymbol{\mu}(0) = (1, 2)$
and $c_{11}(0) = c_{22}(0) = 9$, $c_{12}(0) = c_{21}(0) = 0$; also shown is a sample path of $\mathbf{x}$ starting from
$\boldsymbol{\mu}(0)$. In this system, $F_2 = -0.5x_2$ has no dependence on $x_1$, and $g_{ij} = \sum_k b_{ik}b_{jk}$ are all
constants, so $T_{1\to2} = 0$ by the property established above. The computed result confirms
this inference. In Fig. 1b, $T_{1\to2}$ is zero through time. The other transfer, $T_{2\to1}$, increases
monotonically and eventually approaches a constant.
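The first experiment can be reproduced approximately with a few lines of code. The following is a minimal sketch, not the computation used for Fig. 1. It integrates the mean and covariance equations with a classical Runge-Kutta step, taking the covariance equation in the form $dC/dt = AC + CA^T + BB^T$, and evaluates the transfers with the closed forms $T_{2\to1} = a_{12}c_{12}/c_{11}$ and $T_{1\to2} = a_{21}c_{12}/c_{22}$. These closed forms are an assumption of the sketch, obtained by evaluating the transfer formula for a linear drift, constant $B$ and a Gaussian density; the step size and integration length are also illustrative choices.

```python
import numpy as np

# First experiment: a21 = 0, all entries of B equal to 1.
A = np.array([[-0.5, 0.1],
              [0.0, -0.5]])
B = np.ones((2, 2))
Q = B @ B.T  # entries g_ij = sum_k b_ik b_jk


def rhs(mu, C):
    """Right-hand sides of the mean and covariance equations."""
    return A @ mu, A @ C + C @ A.T + Q


def rk4_step(mu, C, dt):
    """One classical fourth-order Runge-Kutta step for the pair (mu, C)."""
    k1m, k1c = rhs(mu, C)
    k2m, k2c = rhs(mu + 0.5 * dt * k1m, C + 0.5 * dt * k1c)
    k3m, k3c = rhs(mu + 0.5 * dt * k2m, C + 0.5 * dt * k2c)
    k4m, k4c = rhs(mu + dt * k3m, C + dt * k3c)
    mu_new = mu + dt * (k1m + 2 * k2m + 2 * k3m + k4m) / 6.0
    c_new = C + dt * (k1c + 2 * k2c + 2 * k3c + k4c) / 6.0
    return mu_new, c_new


def transfers(C):
    """Closed forms assumed in this sketch (linear drift, Gaussian density)."""
    t21 = A[0, 1] * C[0, 1] / C[0, 0]
    t12 = A[1, 0] * C[0, 1] / C[1, 1]
    return t21, t12


dt, n_steps = 0.01, 2000           # integrate up to t = 20
mu = np.array([1.0, 2.0])          # mu(0) from the text
C = np.diag([9.0, 9.0])            # c11(0) = c22(0) = 9, c12(0) = 0
t21_history = []
for _ in range(n_steps):
    t21_history.append(transfers(C)[0])
    mu, C = rk4_step(mu, C, dt)

t21_final, t12_final = transfers(C)
print(np.round(mu, 4), np.round(C, 3), round(t21_final, 4), t12_final)
```

Under these assumptions the covariance settles at the equilibrium values quoted above, and the recorded history of $T_{2\to1}$ rises from zero toward a constant.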




FIG. 1: (a) A solution of (19) with $a_{21} = 0$: $\boldsymbol{\mu}$ (thick solid), $C$ (dotted). Also shown is a
sample path (solid) starting from $\boldsymbol{\mu}(0)$. (b) The computed information transfers $T_{2\to1}$ (upper) and
$T_{1\to2} = 0$.



An interesting observation about the typical sample path in Fig. 1a is the high correlation
between $x_1$ and $x_2$, in contrast to the zero information transfer $T_{1\to2}$. That is to say, even
though $x_1(t)$ and $x_2(t)$ are highly correlated, the evolution of $x_2$ has nothing to do with
$x_1$. Through this simple example one sees how information transfer extends the traditional
notion of correlation analysis and/or mutual information analysis by including causality [11].
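The contrast between high correlation and zero transfer can be illustrated with a simulated sample path. The following is a minimal sketch, assuming an Euler-Maruyama discretization of the linear system with $a_{21} = 0$ and all entries of $B$ equal to 1; the step size, path length, burn-in and random seed are illustrative choices and not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[-0.5, 0.1],
              [0.0, -0.5]])      # a21 = 0: x2 evolves without reference to x1
B = np.ones((2, 2))

dt, n_steps = 0.01, 100_000      # illustrative discretization
x = np.empty((n_steps + 1, 2))
x[0] = [1.0, 2.0]                # start from mu(0) = (1, 2)

# Wiener increments, one pair (dw1, dw2) per step.
dw = rng.normal(0.0, np.sqrt(dt), size=(n_steps, 2))
for n in range(n_steps):
    x[n + 1] = x[n] + (A @ x[n]) * dt + B @ dw[n]

burn_in = 2000                   # discard the initial transient
corr = np.corrcoef(x[burn_in:, 0], x[burn_in:, 1])[0, 1]

# Correlation implied by the equilibrium covariance quoted in the text.
corr_theory = 2.2 / np.sqrt(2.44 * 2.0)
print(corr, corr_theory)
```

Both components are driven by the same combination $w_1 + w_2$, which is why the sample correlation is high even though $x_1$ does not enter the equation for $x_2$.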
In the second experiment, we let $a_{21} = 0.1 = a_{12}$, resulting in a system symmetric between
$x_1$ and $x_2$. One thus naturally expects two transfers equal in value. The computed results
show that this is indeed so. The transfer $T_{2\to1}$ is equal to $T_{1\to2}$ (not shown). (If $\mu_1 \neq \mu_2$,
initially they may be different, but they merge together soon after the transient period.) In the
third experiment, $a_{21} = 0.2 > a_{12}$; the influence of $x_1$ on $x_2$ is larger than that of $x_2$ on $x_1$, so
one expects a larger $T_{1\to2}$ than $T_{2\to1}$. Again, the computed result agrees with the inference
(not shown). The formulas (16) and (17) are verified with this example.
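The three experiments can be compared at equilibrium with a short calculation. The following is a minimal sketch under stated assumptions: the equilibrium covariance is taken to satisfy $AC + CA^T + BB^T = 0$, which is solved here by vectorizing the matrix equation, and the transfers are evaluated with the closed forms $T_{2\to1} = a_{12}c_{12}/c_{11}$ and $T_{1\to2} = a_{21}c_{12}/c_{22}$. These closed forms hold for a linear drift, constant $B$ and a Gaussian density; they are not stated in the text. The function names are hypothetical.

```python
import numpy as np


def equilibrium_covariance(A, B):
    """Solve A C + C A^T + B B^T = 0 for C by vectorizing the equation."""
    n = A.shape[0]
    eye = np.eye(n)
    # With row-major flattening: vec(A C) = kron(A, I) vec(C),
    # and vec(C A^T) = kron(I, A) vec(C).
    K = np.kron(A, eye) + np.kron(eye, A)
    c = np.linalg.solve(K, -(B @ B.T).reshape(-1))
    return c.reshape(n, n)


def equilibrium_transfers(a21, a12=0.1, a_diag=-0.5):
    """Equilibrium covariance and transfers for the 2D linear example."""
    A = np.array([[a_diag, a12],
                  [a21, a_diag]])
    B = np.ones((2, 2))
    C = equilibrium_covariance(A, B)
    t21 = a12 * C[0, 1] / C[0, 0]   # assumed closed form for T(2->1)
    t12 = a21 * C[0, 1] / C[1, 1]   # assumed closed form for T(1->2)
    return C, t21, t12


results = {a21: equilibrium_transfers(a21) for a21 in (0.0, 0.1, 0.2)}
for a21, (C, t21, t12) in results.items():
    print(f"a21={a21}: T(2->1)={t21:.4f}, T(1->2)={t12:.4f}")
```

Under these assumptions the ordering of the two transfers follows the ordering of $a_{12}$ and $a_{21}$, in line with the three experiments described above.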

We have rigorously established a formalism of information transfer within 2D stochastic
dynamical systems, which is measured by the rate of entropy transferred from one
component to another. The measure possesses a property of transfer asymmetry and, when the
stochastic perturbation to the receiving component does not rely on the giving component,
has the same form as that for the corresponding deterministic system. An application with a
linear system has been presented, from which one sees that correlation does not necessarily
mean causality; for two highly correlated time series, the one-way information transfer could
be zero. Information transfer provides a quantitative way of establishing the causal relation
between dynamical events. This quantification of causality is expected to have important
applications in a wide variety of scientific disciplines.

The author has benefited from several important scientific discussions with Richard Klee- 
man on this subject. He also read through an early version of the manuscript, and his 
comments are greatly appreciated. 